Kingston upon Thames
The Year in Slop
This was the year that A.I.-generated content passed a kind of audiovisual Turing test, sometimes fooling us against our better judgment. The Turing test, a long-established tool for measuring machine intelligence, gauges the point at which a text-generating machine can fool a human into thinking it's not a robot. ChatGPT passed that benchmark earlier this year, inaugurating a new technological era, though not necessarily one of superhuman intelligence . More recently, however, artificial intelligence passed another threshold, a kind of Turing test for the eye: the images and videos that A.I. can produce are now sometimes indistinguishable from real ones. As new, image-friendly models were trained, refined, and released by companies including OpenAI, Meta, and Google, the online public gained the ability to instantly generate realistic A.I. content on any theme they could imagine, from superhero fan art and cute animals to scenes of violence and war.
State of play and future directions in industrial computer vision AI standards
Stefanidou, Artemis, Radoglou-Grammatikis, Panagiotis, Argyriou, Vasileios, Sarigiannidis, Panagiotis, Varlamis, Iraklis, Papadopoulos, Georgios Th.
The recent tremendous advancements in the areas of Artificial Intelligence (AI) and Deep Learning (DL) have also resulted into corresponding remarkable progress in the field of Computer Vision (CV), showcasing robust technological solutions in a wide range of application sectors of high industrial interest (e.g., healthcare, autonomous driving, automation, etc.). Despite the outstanding performance of CV systems in specific domains, their development and exploitation at industrial-scale necessitates, among other, the addressing of requirements related to the reliability, transparency, trustworthiness, security, safety, and robustness of the developed AI models. The latter raises the imperative need for the development of efficient, comprehensive and widely-adopted industrial standards. In this context, this study investigates the current state of play regarding the development of industrial computer vision AI standards, emphasizing on critical aspects, like model interpretability, data quality, and regulatory compliance. In particular, a systematic analysis of launched and currently developing CV standards, proposed by the main international standardization bodies (e.g. ISO/IEC, IEEE, DIN, etc.) is performed. The latter is complemented by a comprehensive discussion on the current challenges and future directions observed in this regularization endeavor.
Applied Federated Model Personalisation in the Industrial Domain: A Comparative Study
Siniosoglou, Ilias, Argyriou, Vasileios, Fragulis, George, Fouliras, Panagiotis, Papadopoulos, Georgios Th., Lytos, Anastasios, Sarigiannidis, Panagiotis
The time-consuming nature of training and deploying complicated Machine and Deep Learning (DL) models for a variety of applications continues to pose significant challenges in the field of Machine Learning (ML). These challenges are particularly pronounced in the federated domain, where optimizing models for individual nodes poses significant difficulty. Many methods have been developed to tackle this problem, aiming to reduce training expenses and time while maintaining efficient optimisation. Three suggested strategies to tackle this challenge include Active Learning, Knowledge Distillation, and Local Memorization. These methods enable the adoption of smaller models that require fewer computational resources and allow for model personalization with local insights, thereby improving the effectiveness of current models. The present study delves into the fundamental principles of these three approaches and proposes an advanced Federated Learning System that utilises different Personalisation methods towards improving the accuracy of AI models and enhancing user experience in real-time NG-IoT applications, investigating the efficacy of these techniques in the local and federated domain. The results of the original and optimised models are then compared in both local and federated contexts using a comparison analysis. The post-analysis shows encouraging outcomes when it comes to optimising and personalising the models with the suggested techniques.
A Closer Look at Data Augmentation Strategies for Finetuning-Based Low/Few-Shot Object Detection
Li, Vladislav, Tsoumplekas, Georgios, Siniosoglou, Ilias, Argyriou, Vasileios, Lytos, Anastasios, Fountoukidis, Eleftherios, Sarigiannidis, Panagiotis
Current methods for low- and few-shot object detection have primarily focused on enhancing model performance for detecting objects. One common approach to achieve this is by combining model finetuning with data augmentation strategies. However, little attention has been given to the energy efficiency of these approaches in data-scarce regimes. This paper seeks to conduct a comprehensive empirical study that examines both model performance and energy efficiency of custom data augmentations and automated data augmentation selection strategies when combined with a lightweight object detector. The methods are evaluated in three different benchmark datasets in terms of their performance and energy consumption, and the Efficiency Factor is employed to gain insights into their effectiveness considering both performance and efficiency. Consequently, it is shown that in many cases, the performance gains of data augmentation strategies are overshadowed by their increased energy usage, necessitating the development of more energy efficient data augmentation strategies to address data scarcity.
Enhancing Performance for Highly Imbalanced Medical Data via Data Regularization in a Federated Learning Setting
Tsoumplekas, Georgios, Siniosoglou, Ilias, Argyriou, Vasileios, Moscholios, Ioannis D., Sarigiannidis, Panagiotis
The increased availability of medical data has significantly impacted healthcare by enabling the application of machine / deep learning approaches in various instances. However, medical datasets are usually small and scattered across multiple providers, suffer from high class-imbalance, and are subject to stringent data privacy constraints. In this paper, the application of a data regularization algorithm, suitable for learning under high class-imbalance, in a federated learning setting is proposed. Specifically, the goal of the proposed method is to enhance model performance for cardiovascular disease prediction by tackling the class-imbalance that typically characterizes datasets used for this purpose, as well as by leveraging patient data available in different nodes of a federated ecosystem without compromising their privacy and enabling more resource sensitive allocation. The method is evaluated across four datasets for cardiovascular disease prediction, which are scattered across different clients, achieving improved performance. Meanwhile, its robustness under various hyperparameter settings, as well as its ability to adapt to different resource allocation scenarios, is verified.
Evaluating the Efficacy of AI Techniques in Textual Anonymization: A Comparative Study
Asimopoulos, Dimitris, Siniosoglou, Ilias, Argyriou, Vasileios, Goudos, Sotirios K., Psannis, Konstantinos E., Karditsioti, Nikoleta, Saoulidis, Theocharis, Sarigiannidis, Panagiotis
In the digital era, with escalating privacy concerns, it's imperative to devise robust strategies that protect private data while maintaining the intrinsic value of textual information. This research embarks on a comprehensive examination of text anonymisation methods, focusing on Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), Embeddings from Language Models (ELMo), and the transformative capabilities of the Transformers architecture. Each model presents unique strengths since LSTM is modeling long-term dependencies, CRF captures dependencies among word sequences, ELMo delivers contextual word representations using deep bidirectional language models and Transformers introduce self-attention mechanisms that provide enhanced scalability. Our study is positioned as a comparative analysis of these models, emphasising their synergistic potential in addressing text anonymisation challenges. Preliminary results indicate that CRF, LSTM, and ELMo individually outperform traditional methods. The inclusion of Transformers, when compared alongside with the other models, offers a broader perspective on achieving optimal text anonymisation in contemporary settings.
Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches
Asimopoulos, Dimitris, Siniosoglou, Ilias, Argyriou, Vasileios, Karamitsou, Thomai, Fountoukidis, Eleftherios, Goudos, Sotirios K., Moscholios, Ioannis D., Psannis, Konstantinos E., Sarigiannidis, Panagiotis
In the realm of data privacy, the ability to effectively anonymise text is paramount. With the proliferation of deep learning and, in particular, transformer architectures, there is a burgeoning interest in leveraging these advanced models for text anonymisation tasks. This paper presents a comprehensive benchmarking study comparing the performance of transformer-based models and Large Language Models(LLM) against traditional architectures for text anonymisation. Utilising the CoNLL-2003 dataset, known for its robustness and diversity, we evaluate several models. Our results showcase the strengths and weaknesses of each approach, offering a clear perspective on the efficacy of modern versus traditional methods. Notably, while modern models exhibit advanced capabilities in capturing con textual nuances, certain traditional architectures still keep high performance. This work aims to guide researchers in selecting the most suitable model for their anonymisation needs, while also shedding light on potential paths for future advancements in the field.
Evaluating the Energy Efficiency of Few-Shot Learning for Object Detection in Industrial Settings
Tsoumplekas, Georgios, Li, Vladislav, Siniosoglou, Ilias, Argyriou, Vasileios, Goudos, Sotirios K., Moscholios, Ioannis D., Radoglou-Grammatikis, Panagiotis, Sarigiannidis, Panagiotis
In the ever-evolving era of Artificial Intelligence (AI), model performance has constituted a key metric driving innovation, leading to an exponential growth in model size and complexity. However, sustainability and energy efficiency have been critical requirements during deployment in contemporary industrial settings, necessitating the use of data-efficient approaches such as few-shot learning. In this paper, to alleviate the burden of lengthy model training and minimize energy consumption, a finetuning approach to adapt standard object detection models to downstream tasks is examined. Subsequently, a thorough case study and evaluation of the energy demands of the developed models, applied in object detection benchmark datasets from volatile industrial environments is presented. Specifically, different finetuning strategies as well as utilization of ancillary evaluation data during training are examined, and the trade-off between performance and efficiency is highlighted in this low-data regime. Finally, this paper introduces a novel way to quantify this trade-off through a customized Efficiency Factor metric.
Toward Green and Human-Like Artificial Intelligence: A Complete Survey on Contemporary Few-Shot Learning Approaches
Tsoumplekas, Georgios, Li, Vladislav, Argyriou, Vasileios, Lytos, Anastasios, Fountoukidis, Eleftherios, Goudos, Sotirios K., Moscholios, Ioannis D., Sarigiannidis, Panagiotis
Despite deep learning's widespread success, its data-hungry and computationally expensive nature makes it impractical for many data-constrained real-world applications. Few-Shot Learning (FSL) aims to address these limitations by enabling rapid adaptation to novel learning tasks, seeing significant growth in recent years. This survey provides a comprehensive overview of the field's latest advancements. Initially, FSL is formally defined, and its relationship with different learning fields is presented. A novel taxonomy is introduced, extending previously proposed ones, and real-world applications in classic and novel fields are described. Finally, recent trends shaping the field, outstanding challenges, and promising future research directions are discussed.
Monitoring Machine Learning Forecasts for Platform Data Streams
Data stream forecasts are essential inputs for decision making at digital platforms. Machine learning algorithms are appealing candidates to produce such forecasts. Yet, digital platforms require a large-scale forecast framework that can flexibly respond to sudden performance drops. Re-training ML algorithms at the same speed as new data batches enter is usually computationally too costly. On the other hand, infrequent re-training requires specifying the re-training frequency and typically comes with a severe cost of forecast deterioration. To ensure accurate and stable forecasts, we propose a simple data-driven monitoring procedure to answer the question when the ML algorithm should be re-trained. Instead of investigating instability of the data streams, we test if the incoming streaming forecast loss batch differs from a well-defined reference batch. Using a novel dataset constituting 15-min frequency data streams from an on-demand logistics platform operating in London, we apply the monitoring procedure to popular ML algorithms including random forest, XGBoost and lasso. We show that monitor-based re-training produces accurate forecasts compared to viable benchmarks while preserving computational feasibility. Moreover, the choice of monitoring procedure is more important than the choice of ML algorithm, thereby permitting practitioners to combine the proposed monitoring procedure with one's favorite forecasting algorithm.